Top_Keyword: An Aggregation Function for Textual Document OLAP

Authors

  • Franck Ravat
  • Olivier Teste
  • Ronan Tournier
  • Gilles Zurfluh
Abstract

For more than a decade, research on OLAP and multidimensional databases has generated methodologies, tools and resource management systems for the analysis of numeric data. With the growing availability of digital documents, there is a need to incorporate text-rich documents into multidimensional databases, along with an adapted framework for their analysis. This paper presents a new aggregation function that aggregates textual data in an OLAP environment. The TOP_KEYWORD function (TOP_KW for short) represents a set of documents by their most significant terms using a weighting function from information retrieval: tf.idf.
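To make the idea concrete, here is a minimal Python sketch of a tf.idf-based top-keyword aggregation. It is an illustration under stated assumptions, not the paper's definition of TOP_KW: the function name `top_keywords`, the parameter `k`, the sample data, and the choice to compute term frequency over all documents of a group while computing inverse document frequency over the whole collection are all hypothetical.

```python
import math
from collections import Counter


def top_keywords(groups, k=2):
    """Return the k highest-scoring terms per group (hypothetical helper).

    groups: dict mapping a group label (e.g. a cell of an OLAP table)
            to a list of documents, each document being a list of tokens.
    Assumption: tf is the term's share of all tokens in the group, and
    idf is log(N / df) over the N documents of the whole collection.
    """
    all_docs = [doc for docs in groups.values() for doc in docs]
    n_docs = len(all_docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in all_docs:
        df.update(set(doc))

    result = {}
    for label, docs in groups.items():
        counts = Counter(term for doc in docs for term in doc)
        total = sum(counts.values())
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result[label] = ranked[:k]
    return result


# Hypothetical sample: two groups of tokenized documents.
sample = {
    "2007": [["olap", "cube", "olap"], ["olap", "query"]],
    "2008": [["text", "keyword", "text"], ["text", "olap"]],
}
keywords = top_keywords(sample, k=2)
```

In this sample, "olap" is frequent in the first group but appears in most documents of the collection, so its idf is low and it does not rank among that group's top terms. This is the usual effect of tf.idf weighting: it favors terms that are specific to a group over terms that are common everywhere.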

Similar resources

OLAP aggregation function for textual data warehouse

For more than a decade, OLAP and multidimensional analysis have generated methodologies, tools and resource management systems for the analysis of numeric data. With the growing availability of semistructured data there is a need for incorporating text-rich document data in a data warehouse and providing adapted multidimensional analysis. This paper presents a new aggregation function for keywo...

OLAP textual aggregation approach using the Google similarity distance

Data warehousing and On-Line Analytical Processing (OLAP) are essential elements to decision support. In the case of textual data, decision support requires new tools, mainly textual aggregation functions, for better and faster high level analysis and decision making. Such tools will provide textual measures to users who wish to analyse documents online. In this paper, we propose a new aggregat...

Multidimensional Analysis of XML Document Contents with OLAP Dimensions

With the emergence of semi-structured data formats (such as XML), the storage of documents in centralised facilities appeared as a natural adaptation of data warehousing technology. Nowadays, OLAP (On-Line Analytical Processing) systems face growing non-numeric data. This chapter presents a framework for the multidimensional analysis of textual data in an OLAP sense. Document structure, metadata...

A Formal Framework of Aggregation for the OLAP-OLTP Model

OLAP applications are widely used in business applications. They are often (implicitly) defined on top of OLTP systems and extensively use aggregation and transformation functions. The main OLAP data structure is a multidimensional table with three kinds of attributes: so-called dimension attributes, implicit attributes given by aggregation functions and fact attributes. Domains of dimension at...

Meta-Stars: Dynamic, Schemaless, and Semantically-Rich Topic Hierarchies in Social BI

A key role in OLAP analyses of textual user-generated content for social business intelligence (SBI) is played by topics, i.e., concepts of interest within a subject area. Topic hierarchies are irregular, heterogeneous, dynamic, and possibly schemaless; besides, unlike in traditional OLAP, different semantics for topic aggregation can be envisioned. In this demonstration we present an architectu...

Publication date: 2008